Measuring Social Biases in Grounded Vision and Language Embeddings
We generalize the notion of social biases from language embeddings to
grounded vision and language embeddings. Biases are present in grounded
embeddings, and indeed seem to be equally or more significant than for
ungrounded embeddings. This is despite the fact that vision and language can
suffer from different biases, which one might hope could attenuate the biases
in both. Multiple ways exist to generalize metrics measuring bias in word
embeddings to this new setting. We introduce the space of generalizations
(Grounded-WEAT and Grounded-SEAT) and demonstrate that three generalizations
answer different yet important questions about how biases, language, and vision
interact. These metrics are used on a new dataset, the first for grounded bias,
created by extending standard linguistic bias benchmarks with 10,228
images from COCO, Conceptual Captions, and Google Images. Dataset construction
is challenging because vision datasets are themselves very biased. The presence
of these biases in systems will begin to have real-world consequences as they
are deployed, making carefully measuring bias and then mitigating it critical
to building a fair society.
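The abstract above does not spell out the grounded metrics. As background only, here is a minimal sketch of the ungrounded WEAT effect size as it is commonly defined, which Grounded-WEAT and Grounded-SEAT generalize. The toy 2-D vectors, the function names, and the use of the sample standard deviation are assumptions for illustration, not the authors' implementation.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    to_a = statistics.mean([cosine(w, a) for a in attrs_a])
    to_b = statistics.mean([cosine(w, b) for b in attrs_b])
    return to_a - to_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations, scaled by the spread over X and Y."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Assumption: sample standard deviation over the union of both target sets.
    spread = statistics.stdev(s_x + s_y)
    return (statistics.mean(s_x) - statistics.mean(s_y)) / spread


# Hypothetical toy embeddings: X leans toward A, Y leans toward B.
attrs_a = [(1.0, 0.0), (0.9, 0.1)]
attrs_b = [(0.0, 1.0), (0.1, 0.9)]
targets_x = [(1.0, 0.2), (0.8, 0.1)]
targets_y = [(0.2, 1.0), (0.1, 0.7)]

d = weat_effect_size(targets_x, targets_y, attrs_a, attrs_b)
print(f"WEAT effect size: {d:.3f}")
```

A positive value means the X targets sit closer to the A attributes than the Y targets do. In a grounded setting the vectors would come from a joint vision-and-language encoder instead of a word-embedding table.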
Learning a natural-language to LTL executable semantic parser for grounded robotics
Children acquire their native language with apparent ease by observing how
language is used in context and attempting to use it themselves. They do so
without laborious annotations, negative examples, or even direct corrections.
We take a step toward robots that can do the same by training a grounded
semantic parser, which discovers latent linguistic representations that can be
used for the execution of natural-language commands. In particular, we focus on
the difficult domain of commands with a temporal aspect, whose semantics we
capture with Linear Temporal Logic, LTL. Our parser is trained with pairs of
sentences and executions as well as an executor. At training time, the parser
hypothesizes a meaning representation for the input as a formula in LTL. Three
competing pressures allow the parser to discover meaning from language. First,
any hypothesized meaning for a sentence must be permissive enough to reflect
all the annotated execution trajectories. Second, the executor -- a pretrained
end-to-end LTL planner -- must find that the observed trajectories are likely
executions of the meaning. Finally, a generator, which reconstructs the
original input, encourages the model to find representations that conserve
knowledge about the command. Together these ensure that the meaning is neither
too general nor too specific. Our model generalizes well, being able to parse
and execute both machine-generated and human-generated commands, with
near-equal accuracy, despite the fact that the human-generated sentences are
much more varied and complex with an open lexicon. The approach presented here
is not specific to LTL: it can be applied to any domain where sentence meanings
can be hypothesized and an executor can verify these meanings, thus opening the
door to many applications for robotic agents. Comment: 10 pages, 2 figures, accepted at the Conference on Robot Learning (CoRL)
202
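To make the first pressure concrete (a hypothesized meaning must admit every annotated trajectory), here is a minimal sketch of checking an LTL formula against a recorded trajectory. It assumes finite-trace semantics, a tuple encoding of formulas, and hypothetical atom names such as "kitchen". The executor described above is a pretrained planner, not a checker like this one.

```python
def holds(formula, trace, i=0):
    """Evaluate an LTL formula at position i of a finite trace.

    A trace is a list of sets of atomic propositions true at each step.
    Formulas are nested tuples, e.g. ("eventually", ("atom", "lab")).
    """
    op = formula[0]
    if op == "atom":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "or":
        return holds(formula[1], trace, i) or holds(formula[2], trace, i)
    if op == "next":
        # Assumption: "next" is false at the last step of a finite trace.
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "eventually":
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "always":
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "until":
        # phi U psi: psi holds at some step j, and phi holds at every step before j.
        return any(
            holds(formula[2], trace, j)
            and all(holds(formula[1], trace, k) for k in range(i, j))
            for j in range(i, len(trace))
        )
    raise ValueError(f"unknown operator: {op}")


# Hypothetical command: "go to the kitchen and then to the lab".
kitchen_then_lab = (
    "eventually",
    ("and", ("atom", "kitchen"), ("eventually", ("atom", "lab"))),
)
trajectory = [{"hall"}, {"kitchen"}, {"lab"}]

print(holds(kitchen_then_lab, trajectory))
print(holds(kitchen_then_lab, list(reversed(trajectory))))
```

A candidate formula that rejects any annotated trajectory is too specific, while one that accepts every possible trajectory is too general. This is the trade-off the three training pressures are described as balancing.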
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity
The unprecedented photorealistic results achieved by recent text-to-image
generative systems and their increasing use as plug-and-play content creation
solutions make it crucial to understand their potential biases. In this work,
we introduce three indicators to evaluate the realism, diversity and
prompt-generation consistency of text-to-image generative systems when prompted
to generate objects from across the world. Our indicators complement
qualitative analysis of the broader impact of such systems by enabling
automatic and efficient benchmarking of geographic disparities, an important
step towards building responsible visual content creation systems. We use our
proposed indicators to analyze potential geographic biases in state-of-the-art
visual content creation systems and find that: (1) models have less realism and
diversity of generations when prompting for Africa and West Asia than Europe,
(2) prompting with geographic information comes at a cost to prompt-consistency
and diversity of generated images, and (3) models exhibit more region-level
disparities for some objects than others. Perhaps most interestingly, our
indicators suggest that progress in image generation quality has come at the
cost of real-world geographic representation. Our comprehensive evaluation
constitutes a crucial step towards ensuring a positive experience of visual
content creation for everyone.
FACET: Fairness in Computer Vision Evaluation Benchmark
Computer vision models have known performance disparities across attributes
such as gender and skin tone. This means during tasks such as classification
and detection, model performance differs for certain classes based on the
demographics of the people in the image. These disparities have been shown to
exist, but until now there has not been a unified approach to measure these
differences for common use-cases of computer vision models. We present a new
benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large,
publicly available evaluation set of 32k images for some of the most common
vision tasks - image classification, object detection and segmentation. For
every image in FACET, we hired expert reviewers to manually annotate
person-related attributes such as perceived skin tone and hair type, manually
draw bounding boxes and label fine-grained person-related classes such as disk
jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art
vision models and present a deeper understanding of potential performance
disparities and challenges across sensitive demographic attributes. With the
exhaustive annotations collected, we probe models using single demographic
attributes as well as multiple attributes using an intersectional approach
(e.g. hair color and perceived skin tone). Our results show that
classification, detection, segmentation, and visual grounding models exhibit
performance disparities across demographic attributes and intersections of
attributes. These harms suggest that not all people represented in datasets
receive fair and equitable treatment in these vision tasks. We hope current and
future results using our benchmark will contribute to fairer, more robust
vision models. FACET is available publicly at https://facet.metademolab.com
Study of Natural Health Product Adverse Reactions (SONAR): Active Surveillance of Adverse Events Following Concurrent Natural Health product and Prescription Drug Use in Community Pharmacies
Background: Many consumers use natural health products (NHPs) concurrently with prescription medications. As NHP-related harms are under-reported through passive surveillance, the safety of concurrent NHP-drug use remains unknown. The objective was to conduct active surveillance in participating community pharmacies to identify adverse events (AEs) related to concurrent NHP-prescription drug use. Methodology/Principal Findings: Participating pharmacists asked individuals collecting prescription medications about (i) concurrent NHP/drug use in the previous three months and (ii) experiences of adverse events. If an adverse event was identified and the patient provided written consent, a research pharmacist conducted a guided telephone interview to gather additional information, after obtaining further verbal consent and documenting it in the interview form. Over a total of 112 pharmacy weeks, 2615 patients were screened, of whom 1037 (39.7%; 95% CI: 37.8% to 41.5%) reported concurrent NHP and prescription medication use. A total of 77 patients reported a possible AE (2.94%; 95% CI: 2.4% to 3.7%), which represents 7.4% of those using NHPs and prescription medications concurrently (95% CI: 6.0% to 9.2%). Of 15 patients available for an interview, 4 (26.7%; 95% CI: 4.3% to 49.0%) reported an AE that was determined to be “probably” due to NHP use. Conclusions/Significance: Active surveillance markedly improves identification and reporting of adverse events associated with concurrent NHP-drug use. Although not without challenges, active surveillance is feasible and can generate adverse event data of sufficient quality to allow for meaningful adjudication to assess potential harms.
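The abstract reports proportions with 95% confidence intervals but does not name the interval method. As a minimal sketch, assuming a Wilson score interval with z = 1.96, the screening-level proportions can be recomputed from the reported counts. The function name and the choice of method are assumptions, not something the source states.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: (label, events, denominator).
reported = [
    ("concurrent NHP and prescription use", 1037, 2615),
    ("possible AE among all screened", 77, 2615),
    ("possible AE among concurrent users", 77, 1037),
]

for label, events, total in reported:
    low, high = wilson_interval(events, total)
    print(f"{label}: {100 * events / total:.1f}% "
          f"(95% CI {100 * low:.1f}% to {100 * high:.1f}%)")
```

Under this assumption the three recomputed intervals agree with the abstract to one decimal place. That agreement is consistent with a Wilson-type interval but does not establish which method the authors used.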
Family composition and age at menarche: findings from the international Health Behaviour in School-Aged Children Study
This research was funded by The University of St Andrews and NHS Health Scotland. Background: Early menarche has been associated with father absence, stepfather presence and adverse health consequences in later life. This article assesses the association of different family compositions with the age at menarche. Pathways are explored which may explain any association between family characteristics and pubertal timing. Methods: Cross-sectional, international data on the age at menarche, family structure and covariates (age, psychosomatic complaints, media consumption, physical activity) were collected from the 2009–2010 Health Behaviour in School-aged Children (HBSC) survey. The sample focuses on 15-year-old girls, comprising 36,175 individuals across 40 countries in Europe and North America (N = 21,075 for age at menarche). The study examined the association of different family characteristics with age at menarche. Regression and path analyses were applied, incorporating multilevel techniques to adjust for the nested nature of data within countries. Results: Living with mother (Cohen’s d = .12), father (d = .08), brothers (d = .04) and sisters (d = .06) are independently associated with later age at menarche. Living in a foster home (d = −.16), with ‘someone else’ (d = −.11), stepmother (d = −.10) or stepfather (d = −.06) was associated with earlier menarche. Path models show that up to 89% of these effects can be explained through lifestyle and psychological variables. Conclusions: Earlier menarche is reported amongst those with living conditions other than a family consisting of two biological parents. This can partly be explained by girls’ higher Body Mass Index in these families, which is a biological determinant of early menarche. Lower physical activity and elevated psychosomatic complaints were also more often found in girls in these family environments. Publisher PDF. Peer reviewed.
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented,
token-based, decoder-only multi-modal language model capable of generating and
infilling both text and images. CM3Leon uses the CM3 multi-modal architecture
but additionally shows the extreme benefits of scaling up and tuning on more
diverse instruction-style data. It is the first multi-modal model trained with
a recipe adapted from text-only language models, including a large-scale
retrieval-augmented pre-training stage and a second multi-task supervised
fine-tuning (SFT) stage. It is also a general-purpose model that can do both
text-to-image and image-to-text generation, allowing us to introduce
self-contained contrastive decoding methods that produce high-quality outputs.
Extensive experiments demonstrate that this recipe is highly effective for
multi-modal models. CM3Leon achieves state-of-the-art performance in
text-to-image generation with 5x less training compute than comparable methods
(zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate
unprecedented levels of controllability in tasks ranging from language-guided
image editing to image-controlled generation and segmentation.
Trends in the perceived body size of adolescent males and females in Scotland, 1990–2014: changing associations with mental well-being
Objectives: This paper explores trends in Scottish adolescents’ body size perceptions and associated mental well-being outcomes. Methods: Data were collected on Scottish 11-, 13- and 15-year-olds by the Health Behaviour in School-aged Children study between 1990 and 2014 (n=42,312). Logistic regression was used to examine changes in the prevalence of over- and underweight perceptions. Ordinal and linear regression were used to examine changes in the association between body perception and mental well-being. Results: Little change was observed in over- or underweight perceptions between 1990 and 2014. However, relative to those perceiving their body as ‘about right’, those perceiving themselves as overweight reported decreasing confidence (all groups), decreasing happiness (11- and 13-year-old girls) and increasing psychological symptoms (all girls and 15-year-old boys). Perceived underweight is associated with poor well-being, especially in males, but we present little evidence that this is a recent phenomenon. Conclusions: We present evidence suggesting that the influence of body image on adolescent mental health is increasing over time. This may play a role in the recently observed worsening of mental well-being in Scottish adolescents. Publisher PDF. Peer reviewed.